AITopics

Country: North America > Canada > Quebec > Montreal (0.05)

Genre: Overview (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.70)

Łazęcka, Małgorzata, Szczurek, Ewa

Factor Analysis with Correlated Topic Model for Multi-Modal Data

arXiv.org Machine Learning, Apr-26-2025

Integrating various data modalities brings valuable insights into underlying phenomena. Multimodal factor analysis (FA) uncovers shared axes of variation underlying different simple data modalities, where each sample is represented by a vector of features. However, FA is not suited for structured data modalities, such as text or single cell sequencing data, where multiple data points are measured per sample and exhibit a clustering structure. To overcome this challenge, we introduce FACTM, a novel, multi-view and multi-structure Bayesian model that combines FA with correlated topic modeling and is optimized using variational inference. Additionally, we introduce a method for rotating latent factors to enhance interpretability with respect to binary features. On text and video benchmarks as well as real-world music and COVID-19 datasets, we demonstrate that FACTM outperforms other methods in identifying clusters in structured data, and integrating them with simple modalities via the inference of shared, interpretable factors.

factor analysis, machine learning, natural language, (16 more...)

2504.18914

Country:

Asia > Middle East > Jordan (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > Promising Solution (0.46)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.66)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Neural Information Processing Systems, Apr-6-2023, 15:26:41 GMT

Correlated Topic Models

Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than x-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution [1].

correlated topic model, lda

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

arXiv.org Machine Learning, Jan-2-2021

A Multilayer Correlated Topic Model

Tian, Ye

We proposed a novel multilayer correlated topic model (MCTM) to analyze how the main ideas inherit and vary between a document and its different segments, which helps understand an article's structure. The variational expectation-maximization (EM) algorithm was derived to estimate the posterior and parameters in MCTM. We introduced two potential applications of MCTM, including the paragraph-level document analysis and market basket data analysis. The effectiveness of MCTM in understanding the document structure has been verified by the great predictive performance on held-out documents and intuitive visualization. We also showed that MCTM could successfully capture customers' popular shopping patterns in the market basket analysis.

mctm, paragraph, topic model, (16 more...)

2101.02028

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > New York (0.04)
North America > Canada (0.04)

Genre: Research Report (0.82)

Industry: Consumer Products & Services > Personal Products (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.75)

Oo, Mi Khine, Khine, May Aye

Topic Extraction of Crawled Documents Collection using Correlated Topic Model in MapReduce Framework

arXiv.org Machine Learning, Jan-6-2020

The tremendous increase in the amount of available research documents impels researchers to propose topic models to extract the latent semantic themes of a document collection. However, how to extract the hidden topics of the document collection has become a crucial task for many topic model applications. Moreover, conventional topic modeling approaches suffer from a scalability problem when the size of the document collection increases. In this paper, the Correlated Topic Model with the variational Expectation-Maximization algorithm is implemented in the MapReduce framework to solve the scalability problem. The proposed approach utilizes a dataset crawled from a public digital library. In addition, the full texts of the crawled documents are analysed to enhance the accuracy of MapReduce CTM. Experiments are conducted to demonstrate the performance of the proposed algorithm. In the evaluation, the proposed approach has performance comparable, in terms of topic coherence, with LDA implemented in the MapReduce framework.

correlated topic model, dataset, mapreduce ctm, (12 more...)

2001.01669

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Myanmar > Yangon Region > Yangon (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

arXiv.org Machine Learning, Jul-1-2017

Efficient Correlated Topic Modeling with Topic Embedding

He, Junxian, Hu, Zhiting, Berg-Kirkpatrick, Taylor, Huang, Ying, Xing, Eric P.

Correlated topic modeling has been limited to small model and problem sizes due to its high computational cost and poor scaling. In this paper, we propose a new model which learns compact topic embeddings and captures topic correlations through the closeness between the topic vectors. Our method enables efficient inference in the low-dimensional embedding space, reducing previous cubic or quadratic time complexity to linear w.r.t. the topic size. We further speed up variational inference with a fast sampler to exploit sparsity of topic occurrence. Extensive experiments show that our approach is capable of handling model and data scales which are several orders of magnitude larger than existing correlation results, without sacrificing modeling quality by providing competitive or superior performance in document classification and retrieval.

artificial intelligence, machine learning, natural language, (20 more...)

1707.00206

Country:

Asia > Middle East (0.46)
North America > United States (0.28)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Yu, Xingchen, Fokoue, Ernest

Probit Normal Correlated Topic Models

arXiv.org Machine Learning, Oct-3-2014

The logistic normal distribution has recently been adapted via the transformation of multivariate Gaussian variables to model the topical distribution of documents in the presence of correlations among topics. In this paper, we propose a probit normal alternative approach to modelling correlated topical structures. Our use of the probit model in the context of topic discovery is novel, as many authors have so far concentrated solely on the logistic model, partly due to the formidable inefficiency of the multinomial probit model even in the case of very small topical spaces. We herein circumvent the inefficiency of multinomial probit estimation by using an adaptation of the diagonal orthant multinomial probit in the topic models context, resulting in the ability of our topic modelling scheme to handle corpuses with a large number of latent topics. An additional and very important benefit of our method lies in the fact that, unlike with the logistic normal model whose non-conjugacy leads to the need for sophisticated sampling schemes, our approach exploits the natural conjugacy inherent in the auxiliary formulation of the probit model to achieve greater simplicity. The application of our proposed scheme to a well-known Associated Press corpus not only helps discover a large number of meaningful topics but also reveals the capturing of compellingly intuitive correlations among certain topics. Besides, our proposed approach lends itself to even further scalability thanks to various existing high performance algorithms and architectures capable of handling millions of documents.

machine learning, natural language, topic model, (18 more...)

1410.0908

Country:

North America (0.69)
Asia > Middle East (0.68)

Genre: Research Report (0.50)

Industry:

Government > Military (0.46)
Government > Foreign Policy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.89)

Virtanen, Seppo, Jia, Yangqing, Klami, Arto, Darrell, Trevor

Factorized Multi-Modal Topic Model

arXiv.org Machine Learning, Oct-16-2012

Multi-modal data collections, such as corpora of paired images and text snippets, require analysis methods beyond single-view component and topic models. For continuous observations the current dominant approach is based on extensions of canonical correlation analysis, factorizing the variation into components shared by the different modalities and those private to each of them. For count data, multiple variants of topic models attempting to tie the modalities together have been presented. All of these, however, lack the ability to learn components private to one modality, and consequently will try to force dependencies even between minimally correlating modalities. In this work we combine the two approaches by presenting a novel HDP-based topic model that automatically learns both shared and private topics. The model is shown to be especially useful for querying the contents of one domain given samples of the other.

artificial intelligence, machine learning, natural language, (19 more...)

1210.492

Country:

Europe (0.93)
Asia > Middle East (0.15)

Genre: Research Report (0.40)

Industry: Transportation (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Lafferty, John D., Blei, David M.

Correlated Topic Models

Neural Information Processing Systems, Dec-31-2006

Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than x-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution [1]. We derive a mean-field variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. The CTM gives a better fit than LDA on a collection of OCRed articles from the journal Science. Furthermore, the CTM provides a natural way of visualizing and exploring this and other unstructured data sets.

correlation, equation, probability, (14 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
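
The logistic normal construction named in the Correlated Topic Models abstract above can be illustrated with a minimal sketch. It assumes the usual reading of that construction: draw a Gaussian vector with mean `mu` and covariance `sigma`, then map it to the simplex with a softmax. The three topic labels, the covariance values, and all function names below are hypothetical choices made for illustration; they are not taken from the paper, and the sketch covers only the draw of topic proportions, not the variational inference.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def softmax(eta):
    """Map a real vector to the probability simplex (numerically stable)."""
    m = max(eta)
    exps = [math.exp(x - m) for x in eta]
    total = sum(exps)
    return [e / total for e in exps]


def sample_topic_proportions(mu, sigma, rng):
    """Draw eta ~ N(mu, sigma) via the Cholesky factor, then return softmax(eta)."""
    low = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    eta = [
        mu[i] + sum(low[i][k] * z[k] for k in range(i + 1))
        for i in range(len(mu))
    ]
    return softmax(eta)


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical topics: 0 = genetics, 1 = disease, 2 = x-ray astronomy.
    # The off-diagonal 0.9 is an assumed covariance tying genetics to disease.
    mu = [0.0, 0.0, 0.0]
    sigma = [[1.0, 0.9, 0.0],
             [0.9, 1.0, 0.0],
             [0.0, 0.0, 1.0]]
    rng = random.Random(0)
    draws = [sample_topic_proportions(mu, sigma, rng) for _ in range(2000)]
    genetics = [t[0] for t in draws]
    disease = [t[1] for t in draws]
    astronomy = [t[2] for t in draws]
    print("corr(genetics, disease):  ", pearson(genetics, disease))
    print("corr(genetics, astronomy):", pearson(genetics, astronomy))
```

The covariance matrix is what carries the correlation structure here. Under a Dirichlet, by contrast, the covariance between any two topic proportions is always negative, which is the limitation of LDA that the abstract points to.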
